Introduction: Survival analysis evaluates data on the time to occurrence of a specific event of interest (e.g., time to death). This analysis allows to quantify and compare the effect of treatment groups on risk over a certain period of time. Commonly, survival analyses are based on survival curves and linear regression modeling. Survival curves show time-to-event data or the probability of surviving past each time point, while linear regression models predict the difference in mortality risk between groups while also controlling for the other variables of interest. As linear mixed-effects models (Module M.5), these linear regressions can be extended to mixed-effects when the data structure is nested.

In this module, you will perform survival analyses based on Kaplan-Meir methods and Cox regression models, while testing associations between early life adversity and the adult survival of Cayo Santiago rhesus macaques.



Upon completion of this module, you will be able to:


References:


Extra training:


Associated literature:


Expand for R Notation and functions index

Notation:


Functions:




Testing associations between early life adversity and adult lifespan


Life history theory posits that the early life environment influences an individual’s fate during adulthood. Early life adversity can have far reaching, negative health consequences later in life that can lead to a reduced lifespan. However, such long term studies are difficult to perform in humans. In order to improve intervention programs that promote the quality of life of vulnerable individuals, it is important to investigate the effects the early life environment can have on lifespan from a comparative perspective. The Cayo Santiago monkeys therefore provide an excellent opportunity to study associations between early life adversity (e.g., hurricanes; Fig 1) and adult lifespan.


**Fig 1**. Cayo Santiago during the aftermath of hurricane Maria (left) and rhesus macaque adult male resting on a dead tree (right). Photo by Raisa Hernández-Pacheco.

Fig 1. Cayo Santiago during the aftermath of hurricane Maria (left) and rhesus macaque adult male resting on a dead tree (right). Photo by Raisa Hernández-Pacheco.


In this module, you will use a subset of Cayo Santiago rhesus macaque demographic data and information regarding their early life environment (cayo_demo_ela; Table 1) to explore whether early life adversity is associated to adult male lifespan. Here, the early life period is defined as the time between birth and 3 years of age. This is authentic demographic data collected through regular visual censuses performed by the staff of the Caribbean Primate Research Center (CPRC) of the University of Puerto Rico-Medical Sciences Campus. Early life adversity information was estimated retrospectively for each individual from demographic and historic data.

Before coding, explore the data in Table 1.


Table 1: Cayo Santiago rhesus macaque male demographic and early life adversity data

Metadata of cayo_demo_ela: Demographic and early life environment data of Cayo Santiago rhesus macaque adult males.



1. Exploring data structure and performing quality check

Before any analysis, you should check the data and understand its attributes. Refer to Module M.1 to check the data structure. Practice your coding first before clicking on the answer!


Guiding questions:

  1. What is the range of ages in the data? Why do you think that is?
Click for Answer!
# age range
range(cayo_demo_ela$age)

The range of ages is from 3.003 to 27.038 years. This is because the data belongs to adult males (\(\geq\) 3 years of age). Thus, all individuals in this data survived into adulthood.


  1. What is the range of birth seasons?
Click for Answer!
# birth season range
range(cayo_demo_ela$season)

Birth seasons range between 1973-2018.


  1. What is the sample size for those that experienced adversities?
Click for Answer!
# summary table for variable 'sibling'
table(cayo_demo_ela$sibling)

# summary table for variable 'hurricane'
table(cayo_demo_ela$hurricane)

Out of 2,142 adult males, 1,520 have a competing younger sibling and 550 experienced a major hurricane.


2. Estimating survival curves using the Kaplan-Meir method

The Kaplan-Meir method estimates survival curves, and thus helps to visualize differences in the time to death between two groups. This method is able to work with censored data or incomplete data from an individual who dropout from the study prematurely, and thus has an unknown fate (right censorship). Although incomplete, right censored data provides valuable information that lets researchers know that the event (such as death) did not occur prior to dropping out of the study, regardless of the group.

Below, you will explore associations between early life adversity and survival using survival curves.


Guiding questions:

  1. Explore the available data, what variables do you need to estimate survival curves across groups?
Click for Answer!
# variables
names(cayo_demo_ela)

As survival curves show time-to-event data or the probability of surviving past each time point, you need information on the status of individuals (variable death) and information on the age at such status (variable age). Because you are interested in evaluating associations between survival and early life adversity, for each adversity you can fit a curve for individuals experiencing the adversity early in life (treatment) versus those who did not experience it (control).


To fit survival curves, you will use R package survival and its functions survfit() and Surv(). The function survfit() is used to create a survival curve and Surv() is used to create a survival object based on status and time-to-event information, otherwise known as the response variable. As in linear model arguments, the response variable is followed by the explanatory variable (separated by “~”). Below, you will fit survival curves for males experiencing a major hurricane early in life and for those who did not.

# installing survival
install.packages("survival",repos="http://cran.us.r-project.org")

# loading survival
library(survival)

# survival curves across hurricane groups 
hurr_curve <- survfit(Surv(age, death) ~ hurricane, data=cayo_demo_ela)

# summary
hurr_curve

Analysis output: For each group (hurricane=0, hurricane=1), the summary provides the number at risk (n), the number of events (events; i.e., deaths), the median time to the event (median; i.e., median age at death), and the 95% confidence intervals of the time to event.


To visualize the survival curve, you will use the R package survminer and its function ggsurvplot().

# installing survminer
install.packages("survminer",repos="http://cran.us.r-project.org")

# loading survminer
library(survminer)

# survival curve
s1 <- ggsurvplot(hurr_curve, data=cayo_demo_ela) 

s1


Package survminer employs ggplot2 functions (Refer to Module M.3). Below are a few recommendations for better visualization.

# survival curve
s2 <- ggsurvplot(hurr_curve, data=cayo_demo_ela, 
                 conf.int = TRUE, # to plot 95% CI
                 censor = FALSE, # to not plot censorship events
                 risk.table =TRUE, # to include a risk table
                 palette = c("grey66","darkorange2"), # to change colors
                 pval = TRUE, # to include a p-value for the log-rank test
                 # other aesthetics:
                 legend=c(.9,.9), legend.labs=c("Non-hurricane","Hurricane"), legend.title="",font.legend=9,        
                 xlab=" Age (years)")
s2

Survival curve interpretation: A stair like “drop” in the survival curve with time indicates an event occurred for a portion of individuals at risk. Notice how the survival probability is at 100% during the first three years of life. In this case, that is correct given that the available data is from males that survived into adulthood. Given the differences in males at risk between the two groups, the 95% CI are larger in the hurricane group. Another important feature in this visualization is the p-value. This p-value is from a log-rank test testing if the two survival curves are statistically different from each other. Under the null hypothesis, there is no difference between the two survival curves.


Guiding questions:

  1. How does the survival at 10 years of age compare between the two groups of males?
Click for Answer!

At 10 years of age, males experiencing a major hurricane early in life have a ~75% chance of survival while males who did not experience a hurricane have a ~70% chance of surviving.

  1. According to your analysis, are these two survival curves significantly different from each other?
Click for Answer!

No. The p-value is > 0.05, thus you fail to reject the null hypothesis. According to the analysis, males experiencing a hurricane event early in life do not show a different survival curve from those who did not experience it.


  1. Using the number at risk, what could be a reason for the changes in sample size?
Click for Answer!

The hurricane group looses all males before reaching 25 years of age. However, the analysis indicate no statistical difference between the survival of the two groups. Thus, the difference in the number at risk could be due to right censorship.


Challenge!

Use the Kaplan-Meir method to test for differences in survival between males experiencing a competing younger sibling early in life relative to those not experiencing it.


3. Fitting Cox-Proportional Hazard regression models.

In many situations, the survival of individuals is influenced by more than one variable at the same time. It could also be the case that the variables of interest are continuous variables, and not groups. In such case, survival analysis can be performed using the Cox Proportional Hazards Regression model to predict a hazard ratio associated to each exploratory variable. Hazard ratios (HR) give you information about the difference in mortality risk between groups (categorical variable) or as the value of the explanatory variable increases (continuous variable).

Below, you will fit a Cox Proportional Hazards regression model with mixed effects to the rhesus macaque data to test if hurricanes during early life are associated to an increased mortality risk among adults. For this, you will use R package coxme and its function coxme(). The model is coded as a linear mixed-effects model, and thus you will need to specify the structure of the random effects (refer to Module M.7). In this case, you will add a random intercept for mom ID to account for potential maternal effects. To see the model output, you will use summary().

# installing coxme
install.packages("coxme",repos="http://cran.us.r-project.org")

# loading coxme
library(coxme)

# cox mixed-effects model including fixed effect hurricane and mom as random intercept
gm1 <- coxme(Surv(age, death) ~ hurricane + (1|mom), data = cayo_demo_ela) # (1|mom) to only include a random intercept for mom

# model output
summary(gm1)

Model output interpretation. The model output contains three main sections: a summary of events and penalized partial likelihood test, summary of the model formula and fixed coefficients, and summary of random effects. When looking at the coefficients (coef), the value for hurricanes is negative. This suggests that mortality risk is reduced for males experiencing a hurricane early in life, relative to males who did not experience it. To obtain the hazard ratio you need to exponentiate the coefficient (exp(coef)). According to this analysis, the hazard ratio is 0.934 or adult males experiencing a hurricane early in life show a 6.6% reduction in mortality risk at any given age (1 - 0.934 = 0.066), relative to males who did not experienced a hurricane. However, this observation is likely chance because the p-value < 0.05.


4. Checking Cox model assumptions

The key assumption in Cox regression models is that the hazard associated to an explanatory variable remains constant over time (proportionality). Below, you will test model assumptions. For this, you can use R packages survival and its function cox.zph() to test if your Cox model meets the assumption of proportionality. A p-value > 0.05 is interpreted as the model meeting the assumption.

# testing model assumptions
cox.zph(gm1)

The p-value < 0.05. This means that the hazard associated to the variable hurricane is not constant over time, and thus your Cox model violates its assumption of proportionality.


5. Addressing model violations and performing timecuts

One way of addressing the fact that the variable hurricane does not meet the proportional hazards assumption (i.e., the relation between the adversity effect and the time to death is not constant over time) is extending the Cox analysis to time-varying covariates. This is done by stratifying the variable hurricane into different age periods that meet the assumption. To determine when (at what age) to stratify hurricane, you will do a visual inspection of the estimated coefficient over time.

Below, you will plot the estimated coefficient of the effect from hurricane over time using the functions plot() and cox.zph() from packgae survival.

# plotting hurricane coefficient over time (age)
plot(cox.zph(gm1)[1],resid = FALSE) #[1] to indicate the first coefficient of the model gm1
abline(0,0,lty=3,lwd=2) # reference line for null effect

Plot interpretation: If the effect of hurricane was constant over time (as assumed), the plotted line would also be constant over time, which is not the case. Here, 0 represents the null effect (i.e., no effect of hurricane on hazard), thus it seems that at young adult ages hurricane reduces the hazard (negative coefficient) but at older adult ages hurricane increases the hazard (positive coefficient). After inspecting the coefficient over time, stratifying the variable at time 8 years when the sign of the effect changes is recommended.

Below, you will stratify the variable hurricane by performing a time cut at 8 years of age. For this, you will incorporate a step-function, or a covariate * time interaction. The covariate * time interaction splits the time into different intervals such that the model is stratified in the different intervals. To create a data frame with a time split, you will need the function survSplit() in the package survival. This function has the arguments cut which is used to identify where the analysis time will be split, and episode= which creates a column that contains information regarding where an individual falls in the time split.

# creating newtime column 
cayo_demo_ela$newtime <- cayo_demo_ela$age

# creating data frame with time split at 8
cayo_demo_ela_split <- survSplit(Surv(newtime, death) ~ ., data= cayo_demo_ela, cut=c(8), episode ="hurricane_timegroup") 


# checking the new dataset
head(cayo_demo_ela_split)

New dataset interpretation: You just split the analysis time at 8 years. Say an individual died at age 5 years, then this individual will have a tstart at 0, and a 1 in the episode column because this individual falls in the first time interval (age at death or censorship is < 8). If an individual died at age 12.2 years, then this individual will now have 2 rows of information; the first row will categorize this individual with a tstart at 0, a 1 in the episode column because this individual falls in the first time interval, and a 0 in the death column because this individual did not die before age 8. The second row will categorize this same individual with a tstart at 8, a 2 in the episode column because this individual now falls in the second time interval (age is > 8) and will have a 1 in the death column because this individual died at age 12.2


Below, you will extend your analysis to a Cox model with time-varying covariates and mixed effects and test whether this new model meets the assumtpion of proportionality. This new model incorporates the the new defined time and the covariate * time interaction using the function strata().

# time varying Cox model 
gm2 <- coxme(Surv(tstart,newtime, death) ~ hurricane*strata(hurricane_timegroup) + (1|mom), data =cayo_demo_ela_split) 

# model output
summary(gm2)

Model output interpretation. The model output can be interpreted as before. The only difference is that now you have two coefficients; one for hurricane effects until 8 years of age and another for effects after 8 years of age.


Guiding questions:

  1. Can you interpret the model output?
Click for Answer!

Major hurricanes early in life influence the survival of adult male rhesus macaques at Cayo Santiago. Such experience is associated to a reduction in mortality risk by 38% until 8 years of age, relative to males who did not experience a hurricane (hazard ratio = 0.62). After 8 years, males who experienced a hurricane early in life were as twice as likely to die (hazard ratio = 2.28), than those who did not experience it.


Finally, you will test the assumptions of proportionality as you did before.

# testing model assumptions
cox.zph(gm2)


Guiding questions:

  1. Does the new model meet the assumption of proportionality?
Click for Answer!

Yes! The p-values for both time periods are laergr than 0.05. Yay!


Challenge!

  1. Build a new Cox model with mixed effects to test associations between a competing younger sibling and experiencing a hurricane and adult survival.

  2. Check the model for violations.


Discussion questions:

  1. Are there significant associations between early life adversities and adult survival in male rhesus macaques.

  2. Did you need to build a model with time-varying covariates? Explain.


FIN



Acknowledgements: The creation of this module was funded by the National Science Foundation DBI BRC-BIO award 2217812. Additional support was provided by the National Institutes of Health to Stephanie J. Gonzalez, award T32GM138075. Cayo Santiago is supported by the Office of Research Infrastructure Programs (ORIP) of the National Institutes of Health, grant 2 P40 OD012217, and the University of Puerto Rico (UPR), Medical Sciences Campus.


Authors: Stephanie J. Gonzalez, Raisa Hernández-Pacheco, California State University, Long Beach